Smart Paper Notes

Self-Supervised Contrastive Representation Learning for 3D Mesh Segmentation

Ayaan Haque, Hankyu Moon, Heng Hao, Sima Didari, Jae Oh Woo, Patrick Bangert

Categories: Computer Vision | Machine Learning

2022-08-08

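3D deep learning is a growing field of interest because of the large amount of information stored in 3D formats. Triangular meshes are an efficient representation for irregular, non-uniform 3D objects. However, meshes are often challenging to annotate because of their high geometric complexity. Specifically, creating segmentation masks for meshes is tedious and time-consuming. It is therefore desirable to train segmentation networks with limited labeled data. Self-supervised learning (SSL), a form of unsupervised representation learning, is an alternative to fully supervised learning that can reduce the burden of supervision during training. We propose SSL-MeshCNN, a self-supervised contrastive learning method for pre-training CNNs for mesh segmentation. We take inspiration from traditional contrastive learning frameworks to design a novel contrastive learning algorithm specifically for meshes. Our preliminary experiments show promising results in reducing the heavy labeled-data requirement of mesh segmentation by at least 33%.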
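
The abstract does not spell out the mesh-specific contrastive algorithm, so the following is only a minimal sketch of the generic InfoNCE-style contrastive loss that traditional frameworks use, not the SSL-MeshCNN method itself. The function names, the plain-list embeddings, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss for one anchor embedding (hypothetical helper).

    The loss is the negative log-softmax score of the positive pair
    among the positive and all negative pairs.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Two augmented views that agree give a lower loss than views that disagree.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a pre-training setting, the anchor and positive would be embeddings of two augmentations of the same mesh, and the negatives would come from other meshes; how the augmentations are built for meshes is not specified in the abstract.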

translated by Google Translate

Analytic Mutual Information in Bayesian Neural Networks

Jae Oh Woo

Categories: Machine Learning

2022-01-24

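Bayesian neural networks have been used successfully to design and optimize robust neural network models in many applications, including uncertainty quantification. Despite this recent success, the information-theoretic understanding of Bayesian neural networks is still at an early stage. Mutual information is one example of an uncertainty measure in a Bayesian neural network, used to quantify epistemic uncertainty. Yet no analytic formula is known for it, even though it is one of the fundamental information measures for understanding the Bayesian deep learning framework. In this paper, we derive an analytic formula for the mutual information between the model parameters and the predictive output by leveraging the notion of point process entropy. Then, as an application, we discuss parameter estimation for the Dirichlet distribution and show its practical use in active learning uncertainty measures by demonstrating that our analytic formula can further improve the performance of active learning in practice.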
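
For context, the mutual information between the prediction and the model parameters is commonly estimated by Monte Carlo sampling as the entropy of the mean prediction minus the mean entropy of the sampled predictions. The sketch below shows that standard estimator only; it is not the analytic formula derived in the paper, and the function names and input layout are assumptions.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mutual_information(samples):
    """Monte Carlo estimate of I(y; theta) for one input (hypothetical helper).

    `samples` is a list of class-probability vectors, one per posterior
    draw of the model parameters (for example, one per dropout pass).
    """
    num_draws = len(samples)
    num_classes = len(samples[0])
    mean_p = [sum(s[c] for s in samples) / num_draws for c in range(num_classes)]
    # Total predictive entropy minus the expected per-draw entropy.
    return entropy(mean_p) - sum(entropy(s) for s in samples) / num_draws


# Draws that agree carry no epistemic uncertainty; confident draws that
# disagree carry the maximum for two classes.
mi_agree = mutual_information([[0.5, 0.5], [0.5, 0.5]])
mi_disagree = mutual_information([[1.0, 0.0], [0.0, 1.0]])
```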

Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle

Jae Oh Woo

Categories: Machine Learning | Machine Learning (Statistics)

2021-05-30

Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle of maximizing mutual information, as in BALD, has been successful and widely adopted in various active learning applications. However, this pool-based objective inherently introduces redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure that requires no relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including the recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from the MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.

Highly Efficient Representation and Active Learning Framework and Its Application to Imbalanced Medical Image Classification

Heng Hao, Hankyu Moon, Sima Didari, Jae Oh Woo, Patrick Bangert

Categories: Computer Vision | Machine Learning

2021-02-25

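We propose a highly data-efficient active learning framework for image classification. Our new framework combines (1) unsupervised representation learning with a convolutional neural network and (2) the Gaussian Process (GP) method, in order to achieve highly data-efficient and label-efficient classification. Moreover, both elements are less sensitive to the prevalent and challenging class imbalance problem, thanks to (1) features learned without labels and (2) the Bayesian nature of GP. The uncertainty estimates provided by GP enable active learning by ranking samples by uncertainty and selectively labeling the samples that show higher uncertainty. We apply this novel combination to the severely imbalanced cases of COVID-19 chest X-ray classification and Nerthus colonoscopy classification. We demonstrate that only 10% of the labeled data is needed to reach the accuracy obtained by training on all available labels. We also applied our model architecture and proposed framework to a broader class of datasets, with the expected success.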
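
The ranking step described above can be sketched as follows. This is a minimal illustration that assumes the per-sample uncertainty scores (for example, GP predictive variances) have already been computed; the function name and the fixed labeling budget are hypothetical and not taken from the paper.

```python
def select_for_labeling(uncertainty, budget):
    """Return the indices of the `budget` most uncertain unlabeled samples.

    `uncertainty` holds one score per unlabeled sample; a higher score
    means the model is less certain about that sample.
    """
    ranked = sorted(range(len(uncertainty)),
                    key=lambda i: uncertainty[i],
                    reverse=True)
    return ranked[:budget]


# Samples 1 and 3 have the highest scores, so they are sent for labeling.
chosen = select_for_labeling([0.1, 0.9, 0.4, 0.7], budget=2)
```

In a full loop, the chosen samples would be labeled, added to the training set, and the model refit before the next round of ranking.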

Tracking by Associating Clips

Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

Categories: Computer Vision

2022-12-20

The tracking-by-detection paradigm has become the dominant method for multi-object tracking; it works by detecting objects in each frame and then performing data association across frames. However, its sequential frame-wise matching fundamentally suffers from intermediate interruptions in a video, such as object occlusions, fast camera movements, and abrupt lighting changes. Moreover, it typically overlooks temporal information beyond the two frames being matched. In this paper, we investigate an alternative by treating object association as clip-wise matching. Our new perspective views a single long video sequence as multiple short clips, and tracking is then performed both within and between the clips. The benefits of this new approach are twofold. First, our method is robust to the accumulation or propagation of tracking errors, as the video chunking allows bypassing the interrupted frames, and the short-clip tracking avoids the conventional error-prone long-term track memory management. Second, multi-frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching. Taking the state-of-the-art tracking-by-detection tracker QDTrack, we showcase how the tracking performance improves with our new tracking formulation. We evaluate our proposals on two tracking benchmarks, TAO and MOT17, which have complementary characteristics and challenges.
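
The chunking idea can be illustrated with a small sketch that splits a video's frame indices into short clips. It assumes fixed-length, non-overlapping clips, which the abstract does not confirm; the function name and parameters are hypothetical.

```python
def split_into_clips(num_frames, clip_len):
    """Split frame indices 0..num_frames-1 into consecutive short clips.

    Assumption: clips do not overlap and the last clip may be shorter.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(range(start, min(start + clip_len, num_frames)))
            for start in range(0, num_frames, clip_len)]


# A 7-frame video with 3-frame clips yields two full clips and one short one.
clips = split_into_clips(7, 3)
```

Tracking would then run inside each clip, followed by a separate association step that links tracks between neighboring clips.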

Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection

Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

Categories: Computer Vision

2022-12-20

Scaling object taxonomies is one of the important steps toward a robust real-world deployment of recognition systems. We have witnessed remarkable progress in images since the introduction of the LVIS benchmark. To continue this success in videos, a new video benchmark, TAO, was recently presented. Given the recent encouraging results from both the detection and tracking communities, we are interested in marrying those two advances and building a strong large vocabulary video tracker. However, supervision in LVIS and TAO is inherently sparse or even missing, posing two new challenges for training large vocabulary trackers. First, LVIS contains no tracking supervision, which leads to inconsistent learning of detection (with LVIS and TAO) and tracking (only with TAO). Second, the detection supervision in TAO is partial, which results in catastrophic forgetting of absent LVIS categories during video fine-tuning. To resolve these challenges, we present a simple but effective learning framework that takes full advantage of all available training data to learn detection and tracking without losing the ability to recognize any LVIS category. With this new learning scheme, we show that consistent improvements are achievable for various large vocabulary trackers, setting strong baseline results on the challenging TAO benchmarks.

Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives

Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, Jonghye Woo

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-08-15

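Deep learning has become the method of choice for tackling real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) large labeled datasets are available for accurate model fitting, and (ii) training and testing data are independent and identically distributed. Its performance on unseen target domains is therefore not guaranteed, especially when it encounters out-of-distribution data at the adaptation stage. The performance drop on data in a target domain is a critical problem in deploying deep neural networks that were successfully trained on data in a source domain. Unsupervised domain adaptation (UDA) has been proposed to counter this by leveraging both labeled source domain data and unlabeled target domain data to carry out various tasks in the target domain. UDA has yielded promising results in natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, and more. In this review of a rapidly evolving topic, we provide a systematic comparison of its methods and applications. In addition, we discuss the connection between UDA and its closely related tasks, such as domain generalization and out-of-distribution detection. Furthermore, we highlight the deficiencies of current methods and possible promising directions.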

Diverse Generative Adversarial Perturbations on Attention Space for Transferable Adversarial Attacks

Woo Jae Kim, Seunghoon Hong, Sung-Eui Yoon

Categories: Computer Vision

2022-08-11

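Adversarial attacks with improved transferability (the ability of an adversarial example crafted on a known model to also fool unknown models) have recently received much attention because of their practicality. Nevertheless, existing transferable attacks craft perturbations in a deterministic manner and often fail to fully explore the loss surface, thus falling into a poor local optimum and suffering from low transferability. To solve this problem, we propose the Attentive-Diversity Attack (ADA), which disrupts diverse salient features in a stochastic manner to improve transferability. First, we perturb the image attention to disrupt universal features shared by different models. Then, to effectively avoid poor local optima, we disrupt these features in a stochastic manner and explore the search space of transferable perturbations more exhaustively. More specifically, we use a generator to produce adversarial perturbations, each of which disturbs the features in a different way depending on an input latent code. Extensive experimental evaluations demonstrate the effectiveness of our method, which outperforms state-of-the-art methods in transferability. Code is available at https://github.com/wkim97/ada.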

Per-Clip Video Object Segmentation

Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

Categories: Computer Vision

2022-08-03

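Recently, memory-based approaches have shown promising results on semi-supervised video object segmentation. These methods predict object masks frame by frame with the help of a frequently updated memory of previous masks. Unlike this per-frame inference, we investigate an alternative perspective by treating video object segmentation as clip-wise mask propagation. In this per-clip inference scheme, we update the memory at an interval and simultaneously process the set of consecutive frames (i.e., a clip) between memory updates. The scheme provides two potential benefits: an accuracy gain from clip-level optimization and an efficiency gain from parallel computation over multiple frames. To this end, we propose a new method tailored for per-clip inference. Specifically, we first introduce a clip-wise operation to refine the features based on intra-clip correlation. In addition, we employ a progressive matching mechanism for efficient information passing within a clip. With the synergy of the two modules and a newly proposed per-clip training scheme, our network achieves state-of-the-art performance on YouTube-VOS 2018/2019 val (84.6% and 84.6%) and DAVIS 2016/2017 val (91.9% and 86.1%). Furthermore, our model shows a strong speed-accuracy trade-off across different memory update intervals, which brings great flexibility.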

UniHPF: Universal Healthcare Predictive Framework with Zero Domain Knowledge

Kyunghoon Hur, Jungwoo Oh, Junu Kim, Min Jae Lee, Eunbyeol Choi, Jiyoun Kim, Seong-Eun Moon, Young-Hak Kim, Edward Choi

Categories: Machine Learning

2022-07-20

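Despite the abundance of electronic healthcare records (EHR), their heterogeneity restricts the use of medical data in building predictive models. To address this challenge, we propose the Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and only minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable of building large-scale EHR models that can process any form of medical data from different EHR systems. Our framework significantly outperforms baseline models in multi-source learning tasks, including transfer and pooled learning, while also showing comparable results when trained on a single medical dataset. To empirically demonstrate the efficacy of our work, we conducted extensive experiments using various datasets, model structures, and tasks. We believe that our findings can provide helpful insights for further research on multi-source learning of EHRs.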
